Combining Multiple, Large-Scale Resources in a Reusable Lexicon for Natural Language Generation
Authors
Abstract
A lexicon is an essential component in a generation system, but few efforts have been made to build a rich, large-scale lexicon and make it reusable for different generation applications. In this paper, we describe our work to build such a lexicon by combining multiple, heterogeneous linguistic resources which have been developed for other purposes. Novel transformation and integration of resources is required to reuse them for generation. We also applied the lexicon to the lexical choice and realization component of a practical generation application by using a multi-level feedback architecture. The integration of the lexicon and the architecture is able to effectively improve the system's paraphrasing power, minimize the chance of grammatical errors, and simplify the development process substantially.

1 Introduction

Every generation system needs a lexicon, and in almost every case, it is acquired anew. Few efforts in building a rich, large-scale, and reusable generation lexicon have been presented in the literature. Most generation systems are still supported by a small system lexicon, with limited entries and hand-coded knowledge. Although such lexicons are reported to be sufficient for the specific domain in which a generation system works, there are some obvious deficiencies: (1) Hand-coding is time and labor intensive, and introduction of errors is likely. (2) Even though some knowledge, such as syntactic structures for a verb, is domain-independent, often it is re-encoded each time a new application is under development. (3) Hand-coding seriously restricts the scale and expressive power of generation systems. As natural language generation is used in more ambitious applications, this situation calls for an improvement.

Generally, existing linguistic resources are not suitable to use for generation directly. First, most large-scale linguistic resources so far were built for language interpretation applications. They are indexed by words, whereas an ideal generation lexicon should be indexed by the semantic concepts to be conveyed, because the input of a generation system is at the semantic level and the processing during generation is based on semantic concepts, and because the mapping in the generation process is from concepts to words. Second, the knowledge needed for generation exists in a number of different resources, with each resource containing a particular type of information; they cannot currently be used simultaneously in a system.

In this paper, we present work in building a rich, large-scale, and reusable lexicon for generation by combining multiple, heterogeneous linguistic resources. The resulting lexicon contains syntactic, semantic, and lexical knowledge, indexed by senses of words as required by generation, including:
- A complete list of syntactic subcategorizations for each sense of a verb to support surface realization.
- A large variety of transitivity alternations for each sense of a verb to support paraphrasing.
- Frequency of lexical items and verb subcategorizations and also selectional constraints derived from a corpus to support lexical choice.
- Rich lexical relations between lexical concepts, including hyponymy, antonymy, and so on, to support lexical choice.
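To make the difference between a word-indexed and a concept-indexed lexicon concrete, the following is a minimal sketch of how such an entry might be organized. It is not the representation used in the paper: the class names, field names, concept identifier, subcategorization labels, and frequency counts are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SenseEntry:
    """One word sense. Every field name here is a hypothetical illustration."""
    lemma: str
    subcategorizations: list[str]   # frames for surface realization
    alternations: list[str]         # transitivity alternations for paraphrasing
    frequency: int = 0              # made-up corpus count for this sense
    relations: dict[str, list[str]] = field(default_factory=dict)


class GenerationLexicon:
    """Maps a semantic concept id to the word senses that can express it."""

    def __init__(self) -> None:
        self._by_concept: dict[str, list[SenseEntry]] = {}

    def add(self, concept: str, entry: SenseEntry) -> None:
        self._by_concept.setdefault(concept, []).append(entry)

    def candidates(self, concept: str) -> list[SenseEntry]:
        # Most frequent sense first: a crude stand-in for lexical choice.
        return sorted(self._by_concept.get(concept, []),
                      key=lambda e: e.frequency, reverse=True)


lexicon = GenerationLexicon()
lexicon.add("c:transfer-possession", SenseEntry(
    lemma="give",
    subcategorizations=["NP-V-NP-NP", "NP-V-NP-PP(to)"],
    alternations=["dative"],
    frequency=120,                  # illustrative value, not corpus data
    relations={"antonym": ["take"]},
))
lexicon.add("c:transfer-possession", SenseEntry(
    lemma="donate",
    subcategorizations=["NP-V-NP-PP(to)"],
    alternations=[],
    frequency=8,                    # illustrative value, not corpus data
))

best = lexicon.candidates("c:transfer-possession")[0]
```

The lookup starts from a concept and returns candidate words, which mirrors the concept-to-word direction of generation described above. A real lexical chooser would presumably also weigh selectional constraints and lexical relations rather than frequency alone.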
Related resources
Integrating a Large-Scale, Reusable Lexicon with a Natural Language Generator
This paper presents the integration of a large-scale, reusable lexicon for generation with the FUF/SURGE unification-based syntactic realizer. The lexicon was combined from multiple existing resources in a semi-automatic process. The integration is a multi-step unification process. This integration allows the reuse of lexical, syntactic, and semantic knowledge encoded in the lexicon in the devel...
Open-source Tools for Creation, Maintenance, and Storage of Lexical Resources for Language Generation from Ontologies
This paper describes reusable, open-source tools for creation, maintenance, storage, and access of Language Resources (LR) needed for generating natural language texts from ontologies. One advantage of these tools is that they provide a user-friendly interface for NLG LR manipulation. They also provide unified models for accessing NLG lexicons and mappings between lexicons and ontologies.
Enabling technology for multilingual natural language generation: the KPML development environment
Natural language generation is now moving away from research prototypes into more practical applications. Generation functionality is also being asked to play a more significant role in established applications such as machine translation. In both cases, multilingual generation techniques have much to offer. However, the take-up of multilingual generation is being restricted by a critical lack bo...
Combining Language Resources Into A Grammar-Driven Swedish Parser
This paper describes work on a rule-based, open-source parser for Swedish. The central component is a wide-coverage grammar implemented in the GF formalism (Grammatical Framework), a dependently typed grammar formalism based on Martin-Löf type theory. GF has strong support for multilinguality and has so far been used successfully for controlled languages (Angelov and Ranta, 2009) and recent exp...
Composition Decomposition Linearization Parsing Extraction Statistical Language Lexicon Source Language Lexicon Target Analysis Generation Realization Lexical Selection LCS Parse Word Lattice AMR Target
This paper describes a large-scale language-independent evaluation of the use of Thematic Hierarchies in natural language generation. We translate from a corpus of sentences reflecting the full variety of behavior of Levin-based verb classes. The corpus is used as input to a generation system that utilizes the same thematic hierarchy for realizing relative argument surface positions in two langu...
Publication date: 1998